A Peek at Correlations and Influences

Now that we have imported the raw data into an HDF5 store and built a spike detector, we can take a look at networks with known structure and try to visualize what actually causes neurons to spike, and what doesn't.

We start with the usual setup and define the methods we need to detect spikes and to create animations with matplotlib.

The code below is copied from the second worksheet (Spike Detection).


In [2]:
# Some core imports
import os
import sys
from subprocess import call
import time
import pandas
import numpy as np
import h5py

import theano
import theano.tensor as T
import theano.tensor.nnet as tnn
import matplotlib.pyplot as plt


# These are IPython specific
from IPython.display import display, clear_output
%matplotlib inline
%load_ext cythonmagic

In [3]:
# Core configuration

datadir = "/data/quick/connectomics"
mirror = "http://www.causality.inf.ethz.ch/data/connectomics"
if ('nbdir' not in globals()):
    nbdir = os.getcwd()
os.chdir(datadir)

In [4]:
store = h5py.File(datadir + '/store.h5')
print store.keys()


[u'highcc', u'highcon', u'lowcc', u'normal-1', u'normal-2', u'normal-3', u'normal-4', u'normal-4-lownoise', u'small-1', u'small-2', u'small-3', u'small-4', u'small-5', u'small-6', u'test', u'valid']

In [5]:
net = store['small-1'] # We will use this network for initial testing purposes
print net.keys()


[u'connectionMatrix', u'fluorescence', u'networkPositions', u'signalDistanceMatrix', u'spatialDistanceMatrix', u'spikes']

In [6]:
import matplotlib.animation as manimation
from IPython.display import HTML, FileLink
from base64 import b64encode
import matplotlib

matplotlib.verbose.set_level('silent')

def display_video_inline(filename):
    video = open(filename, "rb").read()
    video_encoded = b64encode(video).decode('ascii')
    video_tag = '<video controls alt="Video" src="data:video/x-m4v;base64,{0}"/>'.format(video_encoded)
    return HTML(data=video_tag)

def display_video(filename):
    link = FileLink(os.path.abspath(filename))
    video_tag = '<video controls alt="Video" src="{0}{1}" /><br/><a href="{0}{1}">Source</a>'.format(link.url_prefix,link.path)
    return HTML(data=video_tag)

def display_video_link(filename):
    link = FileLink(os.path.abspath(filename))
    video_tag = '<a href="{0}{1}">{1}</a>'.format(link.url_prefix,link.path)
    return HTML(data=video_tag)
    
video_dpi = 100
FFMpegWriter = manimation.writers['ffmpeg']
metadata = dict(title='Connectomics Visualization Video', artist='K. Londenberg')

Network Structure

Now that we have a spike detector, we can take a look at the network structure. Let's first see how connectivity and neuron distance correlate with each other. Luckily, we already have the corresponding distance matrices prepared.


In [7]:
net = store['normal-1']
print net.keys()


[u'connectionMatrix', u'fluorescence', u'networkPositions', u'signalDistanceMatrix', u'spatialDistanceMatrix', u'spikes']

In [11]:
spd = net['signalDistanceMatrix'].value
spdflat = pandas.Series(spd.reshape((spd.shape[0]*spd.shape[1])))
spatial = net['spatialDistanceMatrix'].value
for v in sorted(spdflat.unique()):
    if (v==0.0):
        continue
    # Histogram of the spatial distances of all neuron pairs at this signal distance
    selector = (spd==v)
    f = pandas.Series(spatial[selector].flatten())
    avg = f.mean()
    med = f.median()
    if (len(f)<10):
        continue
    plt.xlim(0.0, 1.5)
    plt.ylim(0.0, 1.5)
    plt.text(1.0,1.4, "Mean=%f" % (avg))
    plt.text(1.0,1.3, "Median=%f" % (med))
    n, bins, patches = plt.hist(f, np.linspace(0.0, 1.5, 50), normed=True, cumulative=False, facecolor='g', alpha=0.75)
                                                        
    plt.xlabel('Distance')
    plt.ylabel('Counts')
    plt.title('Neuron Distance for Signal Distance %r' % (v))
    plt.show()


Conclusions so far

Neuron distance and signal distance don't correlate in any significant way here; the histograms above make that pretty obvious. So we won't take spatial distance into account.

Next Step

For every neuron spike we look back (and ahead) within a time window and record, for every other neuron, whether it also spiked at certain points in time. We aggregate this information per spike and group the "incoming" spikes by signal distance.

If you didn't understand that, it's ok. The toy sketch below and the visualizations will make it much clearer.
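
As a toy sketch of the bookkeeping (the arrays here are made up for illustration, this is not the real data): for a spike of the reference neuron at time t we cut a window out of the spike raster and average the rows of all neurons that sit at a given signal distance.

# Toy illustration of the windowing idea (hypothetical data, not the HDF5 store)
import numpy as np

toy_spikes = np.zeros((4, 100), dtype=np.float32)  # 4 neurons, 100 time steps
toy_spikes[0, 50] = 1.0                            # reference neuron spikes at t=50
toy_spikes[1, 49] = 1.0                            # a 1-hop parent spikes just before
toy_spikes[2, 45] = 1.0                            # a 2-hop ancestor spikes earlier

t, mini, maxi = 50, -10, 10
window = toy_spikes[:, t+mini : t+maxi+1]          # look back and ahead around the spike

# Group rows by signal distance to the reference neuron and inspect each group
toy_distance = np.array([0.0, 1.0, 2.0, np.inf])
for d in (1.0, 2.0):
    trace = window[toy_distance == d].mean(axis=0)
    print "distance %r: spike offsets %r" % (d, list(trace.nonzero()[0] + mini))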


In [1]:
def spike_causality_counts(net, intervals, do_full_trace=True, spatial_distance_range=None):
    ''' For every spike in the network, record the spiking activity of all
        other neurons in a time window around it, grouped by signal distance.
        intervals is a list of (min, max) time offsets relative to each spike;
        spatial_distance_range optionally restricts the source neurons. '''
    sdd = net['signalDistanceMatrix'].value
    fluor = net['fluorescence'].value
    sddflat = pandas.Series(sdd.flatten())
    signal_distances = sorted(sddflat.unique())
    spikes = net['spikes'].value
    ncount = fluor.shape[0]
    maxtime = fluor.shape[1]
    from sys import stdout
    print spikes.shape
    spike_locations = spikes.nonzero()
    # Calculate max/min range of interest
    mini = intervals[0][0]
    maxi = intervals[0][1]
    for (mn,mx) in intervals:
        assert mn<mx
        mini = min(mn, mini, mx)
        maxi = max(mn, maxi, mx)
        print "mx=%d mn=%d -> mini=%d ; maxi=%d" % (mn,mx, mini, maxi)
    if (spatial_distance_range is not None):
        spatial = net['spatialDistanceMatrix'].value
        
    assert maxi>mini
    print "-> mini=%d ; maxi=%d" % (mini, maxi)
    result = np.zeros((1+len(signal_distances), len(intervals)), dtype=np.float64)
    result_sample_size = np.zeros((1+len(signal_distances), len(intervals)), dtype=np.int32)
    nlen = maxi-mini+1
    ssizes = [0.0]*(1+len(signal_distances))
    if (do_full_trace):
        spike_trace = np.zeros((1+len(signal_distances), len(spike_locations[0]), nlen))
        maxfluor_trace = np.zeros((1+len(signal_distances), len(spike_locations[0]), nlen))
        
    for spi in range(len(spike_locations[0])):
        
        neuron = spike_locations[0][spi]
        t = spike_locations[1][spi]
        
        assert spikes[neuron,t] == 1.0
        #clear_output()
        if (spi % 1000==0):
            #clear_output()
            print "Counting causing spikes of neuron %d at time %d: %d of %d " % (neuron, t, spi, len(spike_locations[0]))
            stdout.flush()
            
        sdslice = sdd[:,neuron].flatten() # All neurons signaling to this one
        for i in range(1+len(signal_distances)):
            if (i==0):
                # Index 0 stands for "all neurons". Use a boolean mask so that
                # the indexing below selects rows instead of doing integer
                # fancy indexing.
                src_selector = np.ones((ncount), dtype=np.bool_)
            else:
                # Select all neurons which can signal with distance signal_distances[i-1] to the current neuron
                src_selector = (sdslice==signal_distances[i-1])
            src_ncount_nr = float(np.sum(src_selector))
            
            if (spatial_distance_range is not None):
                c1 = spatial[neuron,:] >= spatial_distance_range[0]
                c2 = spatial[neuron,:] < spatial_distance_range[1]
                src_selector = src_selector & c1 & c2
            src_ncount = float(np.sum(src_selector))
            ssizes[i] += src_ncount
            if (src_ncount==0.0):
                if (i==1):
                    spike_trace[i,spi,-maxi-1]=1.0
                continue
            # Select all selected neurons spikes in region of interest around the current spike
            #print "Time = %d, mini=%d, maxi=%d" % (t, mini, maxi)
            soff = max(0, min(t+mini, maxtime))
            eoff = max(0, min(t+maxi+1, maxtime))
            if (eoff-soff==0):
                if (i==1):
                    print "Time = %d, mini=%d, maxi=%d - soff=%d, eoff=%d, maxtime=%d" % (t, mini, maxi, soff, eoff, maxtime)
                    spike_trace[i,spi,-maxi-1]=1.0
                continue
            
            rs = spikes[src_selector,slice(soff,eoff)]
            
            if (i==1):
                assert np.all(rs[:,-maxi-1]==1.0)
            
            if (rs.shape[1]<nlen):
                rspikes = np.zeros((rs.shape[0], nlen), dtype=np.float32)
                rspikes[:,-rs.shape[1]:] = rs
            else:
                rspikes = rs
                
            # And sum up their counts at the corresponding time slices
            rmean = np.mean(rspikes, axis=0)
            if (i==1):
                assert rmean[-maxi-1]==1.0
                
            mlen = rmean.shape[0]
            
            if (do_full_trace):
                rf = fluor[src_selector,slice(soff,eoff)]
                if (rf.shape[1]<nlen):
                    rfluor = np.zeros((rf.shape[0], nlen), dtype=np.float32)
                    rfluor[:,-rf.shape[1]:] = rf
                else:
                    rfluor = rf
                
                rmfluor = np.mean(rfluor, axis=0)
                assert len(rmean)==nlen
                spike_trace[i,spi,:] = rmean
                maxfluor_trace[i,spi,:] = rmfluor
                
            for iidx,(mn,mx) in enumerate(intervals):
                mi = min(max(0,mlen-1+mn), mlen)
                ma = min(max(0,mlen-1+mx), mlen)
                if (ma>mi):
                    # Mean spike rate over the selected neurons within this
                    # interval (slice time steps, i.e. columns, not rows)
                    interval_rate = float(np.mean(rspikes[:,mi:ma]))
                    result[i, iidx] += interval_rate
                    result_sample_size[i,iidx] += 1
    print "Sample Sizes were %r" % (ssizes)
    if (do_full_trace):
        return (np.log(result)-np.log(result_sample_size), signal_distances, spike_trace, maxfluor_trace)
    
    return (result/result_sample_size, signal_distances)

In [13]:
spks = spike_causality_counts(store['normal-4-lownoise'], [(-10, 10),(-200,-1),(-10,0),(-50,-11),(-200,-51)])


(1000, 179500)
mn=-10 mx=10 -> mini=-10 ; maxi=10

In [14]:
(result, signal_distances, spike_trace, fluor_trace) = spks

In [15]:
fluor_mean = np.mean(fluor_trace, axis=1)
spike_mean = np.mean(spike_trace, axis=1)
fluor_variance = np.var(fluor_trace, axis=1) # per-timestep variance (currently unused)
for i in range(len(signal_distances)+1):
    # the histogram of the data
    fig = plt.figure(figsize=(12,8))
    fluor = fluor_mean[i,:]
    spikes = spike_mean[i,:]
    #print fluor
    #print spikes
    #plt.plot(np.linspace(-len(fluor),0.0, len(fluor)), fluor/float(np.max(fluor)), color='g', alpha=0.3)
    plt.plot(np.linspace(-len(spikes),0.0, len(spikes)), np.sqrt(spikes), color='r', alpha=1.0)
    
    plt.xlabel('t')
    if (i==0):
        dist = "All"
    else:
        dist = signal_distances[i-1]
    plt.title('Fluorescence and spikes for signal distance %r' % (dist))
    plt.show()


Interpretation

The graphs above are pretty interesting. They lead to a few conclusions:

  • Correlated spikes seem to occur almost simultaneously.
  • Spikes of causally disconnected neurons also happen almost simultaneously. This is probably due to confounding and to spikes propagating through the network.
  • The number of hops a signal needs to travel seems to affect the timing.
  • Seeing a neuron spike in the "silent" range, roughly 10 to 60 time steps before the reference neuron's spike, makes it almost certain that this neuron did not cause the reference neuron to fire.

Let's zoom in on the interesting details now.


In [17]:
fluor_mean = np.mean(fluor_trace, axis=1)
spike_mean = np.mean(spike_trace, axis=1)
fluor_stddev = np.std(fluor_trace, axis=1)

assert spike_mean[1,-11]==1.0
def plot_traces(title, ind, show_fluor, slc):
    # Note: slc is currently unused - the plotting window is hardcoded to the last 90 steps
    sel = np.zeros((fluor_mean.shape[0]), dtype=np.bool_)
    for i in ind:
        sel[i] = True
    fig = plt.figure(figsize=(16,10))
    print sel.astype(np.int8)
    
    if (len(ind)>1):
        fluor = np.mean(fluor_mean[sel,-90:], axis=0)
        spikes = np.mean(spike_mean[sel,-90:], axis=0)
        std = np.mean(fluor_stddev[sel,-90:], axis=0)
    else:
        fluor = fluor_mean[ind[0],-90:]
        spikes = spike_mean[ind[0],-90:]
        std = fluor_stddev[ind[0],-90:]
        if (ind[0]==1):
            assert spike_mean[1,-11]==1.0
            assert spikes[-11]==1.0
        
    plt.ylim(-0.1, 1.2)
    #plt.plot(np.linspace(-len(spikes)+11.0,11.0, len(spikes)), spikes, color='b', alpha=1.0)
    plt.plot(np.linspace(-len(spikes)+11.0,11.0, len(spikes)), spikes**0.3, color='r', alpha=0.4)
    if (show_fluor):
        plt.plot(np.linspace(-len(spikes)+11.0,11.0, len(spikes)), fluor, color='g', alpha=1.0)
        plt.plot(np.linspace(-len(spikes)+11.0,11.0, len(spikes)), fluor+std, color='g', alpha=0.6)
        plt.plot(np.linspace(-len(spikes)+11.0,11.0, len(spikes)), fluor-std, color='g', alpha=0.6)
        plt.fill_between(np.linspace(-len(spikes)+11.0,11.0, len(spikes)), fluor+std, 
                     fluor-std, facecolor='g', alpha=0.3)
    
    plt.title(title)
    plt.show()

In [19]:
plot_traces('All Neurons', [0], False, slice(-90))
plot_traces('Reference Neuron', [1], True, slice(-200))
plot_traces('Connected to Reference, 1 Hop ', [2], False, slice(-90))
plot_traces('Connected to Reference, 2 Hops ', [3], False, slice(-90))
plot_traces('Connected to Reference, 3 Hops ', [4], False, slice(-90))
plot_traces('Connected to Reference, 4 Hops ', [5], False, slice(-90))
plot_traces('Connected to Reference, 5 Hops ', [6], False, slice(-90))
plot_traces('Connected to Reference, 6 Hops ', [7], False, slice(-90))
plot_traces('Connected to Reference, Infinite Hops', [8], False, slice(-90))


[1 0 0 0 0 0 0 0 0]
[0 1 0 0 0 0 0 0 0]
[0 0 1 0 0 0 0 0 0]
[0 0 0 1 0 0 0 0 0]
[0 0 0 0 1 0 0 0 0]
[0 0 0 0 0 1 0 0 0]
[0 0 0 0 0 0 1 0 0]
[0 0 0 0 0 0 0 1 0]
[0 0 0 0 0 0 0 0 1]

Thoughts so far

Interesting, indeed. It should be possible to use the correlation within a very tight time window around a spike as a feature for training a classifier to predict the signal distance (a sketch follows after this list). But we need to take a few more factors into account:

  • Spatial distance to the correlated neurons. Maybe distance doesn't correlate with connectivity, but it might correlate with measured spikes due to light diffusion. Luckily, this is easy to check.
  • Confounding. Correcting for confounding is hard in this context. We might do it with conditional independence tests, but these quickly become intractable. We will probably have to cluster neurons into strongly correlated groups, and then perform conditional independence tests for each triplet within a group. We might use the PC or PC* algorithm here.
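
A hedged sketch of such a feature for a directed pair of neurons. The window boundaries -1/+1 and -50/-11 are illustrative choices read off the plots above, not tuned values:

def window_rate(spikes, src, times, mn, mx):
    # Fraction of the destination neuron's spikes for which the source
    # neuron fired at least once inside the window [mn, mx]
    maxtime = spikes.shape[1]
    hits = 0
    for t in times:
        lo, hi = max(0, t+mn), min(maxtime, t+mx+1)
        if (hi>lo and np.any(spikes[src, lo:hi])):
            hits += 1
    return hits / float(max(1, len(times)))

def pair_features(spikes, src, dst):
    ''' Two features for the directed pair src -> dst: co-spike rate in a
        tight window around dst's spikes vs. the rate in the "silent" range. '''
    times = np.nonzero(spikes[dst])[0]
    tight = window_rate(spikes, src, times, -1, 1)
    silent = window_rate(spikes, src, times, -50, -11)
    return (tight, silent)  # high tight and low silent rates hint at a direct edge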

Next Step

Because it's easy, we start by checking whether spatial distance affects the measured spikes due to light diffusion. If it does, we might have to change our spike detector to be less sensitive to these effects.


In [422]:
net = store['normal-4-lownoise']
spatial = net['spatialDistanceMatrix'].value
sps = pandas.Series(spatial.flatten())

spks_near = spike_causality_counts(net, [(-10, 10),(-200,-1),(-10,0),(-50,-11),(-200,-51)], True, (0.0, sps.quantile(0.3)))
spks_far = spike_causality_counts(net, [(-10, 10),(-200,-1),(-10,0),(-50,-11),(-200,-51)], True, (sps.quantile(0.7), float('inf')))

In [425]:
(result, signal_distances, spike_trace, fluor_trace) = spks_near

fluor_mean = np.mean(fluor_trace, axis=1)
spike_mean = np.mean(spike_trace, axis=1)
fluor_stddev = np.std(fluor_trace, axis=1)

plot_traces('Near: All Neurons', [0], False, slice(-90))
#plot_traces('Near: Sig1 Neurons', [2], False, slice(-90))
#plot_traces('Near: Sig2 Neurons', [3], False, slice(-90))
#plot_traces('Near: SigInf Neurons', [8], False, slice(-90))

(result, signal_distances, spike_trace, fluor_trace) = spks_far

fluor_mean = np.mean(fluor_trace, axis=1)
spike_mean = np.mean(spike_trace, axis=1)
fluor_stddev = np.std(fluor_trace, axis=1)

plot_traces('Far: All Neurons', [0], False, slice(-90))
#plot_traces('Far: Sig1 Neurons', [2], False, slice(-90))
#plot_traces('Far: Sig2 Neurons', [3], False, slice(-90))
#plot_traces('Far: Sig3 Neurons', [4], False, slice(-90))
#plot_traces('Far: SigInf Neurons', [8], False, slice(-90))


[1 0 0 0 0 0 0 0 0]
[1 0 0 0 0 0 0 0 0]

Interpretation

Light diffusion to nearby neurons doesn't seem to have an effect on the detected spikes.

Spike Frequency Analysis

We should build a running statistic of spike distances or frequencies and see what we can deduce from it. But first we take a look at the average frequencies, how they are distributed, and how that correlates with the connectivity of the neurons.


In [491]:
spikes = net['spikes'].value
spikefreq = pandas.Series(np.mean(spikes, axis=1))
spikefreq.describe()
fig = plt.figure(figsize=(12,8))
plt.hist(spikefreq, 40, normed=True, log=True, color='g', alpha=0.8)
plt.show()


The frequencies don't seem to follow any standard distribution. We should take more networks into account.
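
A quick sketch of such a cross-network check, using the store layout from above (only groups that carry a 'spikes' dataset are considered):

# Compare the average spike-frequency distribution across all stored networks
for name in store.keys():
    group = store[name]
    if ('spikes' not in group):
        continue
    freq = pandas.Series(np.mean(group['spikes'].value, axis=1))
    print "%-20s mean=%.6f std=%.6f min=%.6f max=%.6f" % (
        name, freq.mean(), freq.std(), freq.min(), freq.max())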


In [451]:
import networkx as nx

def create_nx_graph(net, digraph=True):
    ''' Load a neural network with known connectivity as a NetworkX directed graph'''
    if (not 'connectionMatrix' in net):
        return None
    if (digraph):
        G = nx.DiGraph() 
    else:
        G = nx.Graph() 
    conn = net['connectionMatrix'].value
    G.add_nodes_from(range(conn.shape[0]))
    # Add a directed edge i -> j for every positive entry of the connection matrix
    rows, cols = np.nonzero(conn > 0.0)
    G.add_edges_from(zip(rows, cols))
    return G

G = create_nx_graph(net, True)
print G




In [447]:
print spikefreq.describe()
lowfreq_neurons = np.nonzero(spikefreq<=spikefreq.quantile(0.003))[0]
highfreq_neurons = np.nonzero(spikefreq>=spikefreq.quantile(1.0-0.003))[0]
print lowfreq_neurons
print highfreq_neurons


count    1000.000000
mean        0.002031
std         0.000213
min         0.000396
25%         0.001961
50%         0.001972
75%         0.002050
max         0.003226
dtype: float64
[ 75 231 325]
[ 14  72 444 818 886 909]

In [475]:
indegrees = pandas.Series(np.array([float(G.in_degree(i)) for i in range(len(spikefreq))], dtype=np.float32))
outdegrees = pandas.Series(np.array([float(G.out_degree(i)) for i in range(len(spikefreq))], dtype=np.float32))
ds = pandas.DataFrame({ 'spikefreq' : spikefreq, 'indegree' : indegrees, 'outdegree' : outdegrees})



In [498]:
print '''Pearson Correlation Coefficients between average spike frequency and
    Indegree: %f
    Outdegree: %f 
    ''' % (ds.spikefreq.corr(ds.indegree, method='pearson'),
           ds.spikefreq.corr(ds.outdegree, method='pearson'))


Pearson Correlation Coefficients between average spike frequency and
    Indegree: 0.188110
    Outdegree: -0.035427 
    

Observations so far

We have a notable correlation between the average spike frequency and the indegree of a node. We should use that later on. Let's take a look at how the in- and outdegrees are distributed.


In [499]:
fig = plt.figure(figsize=(12,8))
plt.hist(ds.indegree, 20, normed=True, color='g', alpha=0.8)
plt.title('Indegree distribution')
plt.show()

fig = plt.figure(figsize=(12,8))
plt.hist(ds.outdegree, 20, normed=True, color='g', alpha=0.8)
plt.title('Outdegree distribution')
plt.show()



In [509]:
# Let's see how sparse the spikes are actually distributed

s = np.sum(spikes, axis=0)

spikelocs = np.array(np.nonzero(s)[0], dtype=np.int32)
spikenz = pandas.Series(s[spikelocs])
print len(spikelocs)
print spikes.shape[1]
print spikes.shape[1]/len(spikelocs)
plt.figure(figsize=(12,8))
plt.hist(spikenz, 30, color='r', alpha=0.8)
plt.show()

spikenz.describe()


3506
179500
51
Out[509]:
count    3506.000000
mean      103.997718
std       219.123970
min         1.000000
25%         4.000000
50%        15.000000
75%        55.000000
max       899.000000
dtype: float64

Next Steps

We will count the number of spikes a neuron received from its incoming neurons since its last spike. From these counts we will build an exhaustive set of probability mass functions, each of which we will first visualize as a histogram.
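
A sketch of that counting step. It uses the known connection matrix, so it only works on the training networks, and the exact window boundaries are a modeling choice:

def spikes_since_last_spike(net):
    ''' For every neuron, count the parent spikes that arrived between its
        consecutive spikes (sketch; assumes conn[i,j]>0 means an edge i -> j). '''
    spikes = net['spikes'].value
    conn = net['connectionMatrix'].value
    counts = {}
    for n in range(spikes.shape[0]):
        parents = np.nonzero(conn[:,n]>0.0)[0]
        if (len(parents)==0):
            continue
        obs = []
        prev = 0
        for t in np.nonzero(spikes[n])[0]:
            obs.append(float(np.sum(spikes[parents, prev:t])))
            prev = t
        counts[n] = pandas.Series(obs)
    return counts

# counts = spikes_since_last_spike(store['normal-1'])
# counts[0].hist(bins=40)   # one of the probability mass functions, as a histogram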

Bayesian Modeling

Let $ NE $ be the set of neurons and let $ ||NE|| $ denote the size of this set.

Let $ Pa(n) $ be the true set of parents of neuron $ n \in NE $; $ ||Pa(n)|| $ denotes the size of this set. We use the variable $ cap(n) $ to denote a specific candidate set of parents for neuron $ n $, which may be identical, partially identical or completely disjoint from the true set of parents $ Pa(n) $. We use $ ||cap(n)|| $ to denote the size of this set.

We assume that the number of parents of a neuron is bounded by a known upper bound $ max(||Pa(NE)||) $ - we won't consider sets of parents larger than this maximum.

Let $ Spikecount(j, cap(n), n) $ denote observation $ j \in \{1, ..., J_n\} $ of the number of spikes of the neurons in set $ cap(n) $ leading up to spike $ j $ of neuron $ n $, where $ J_n $ denotes the number of spikes of neuron $ n $.

We are interested in the general probability mass function

$$ P( Pa(n) = cap(n) | Spikecount(j, cap(n), n)) $$

We can't directly get this, so we use Bayes formula:

$$ P( Pa(n) = cap(n) | Spikecount(j, cap(n), n)) = \frac{P(Spikecount(j, cap(n), n) | Pa(n)=cap(n)) \cdot P(Pa(n)=cap(n))}{Z} $$

So let's attack these terms one by one:

Likelihood

$$ P(Spikecount(j, cap(n), n) | Pa(n)=cap(n)) $$

This is effectively the likelihood of the observation $ j $ given that $ cap(n) $ is actually the true set of parents $ Pa(n) $.

Let's assume all neurons with a given number of parents behave similarly: they require a minimum number of spikes from their parents before they fire themselves.

$$ \forall j : Spikecount(j, Pa(n), n) \geq K_n $$

If we assume $ cap(n) = Pa(n) $ we can simply estimate $ K_n $ as the minimum of all observations, or something very close to the minimum (say, the 1% quantile) to allow for some measurement error.

$$ \min_j(Spikecount(j, Pa(n), n)) = K_n $$

Let's further assume that we can describe the spike count as a constant plus a stochastic error term whose distribution is determined by the number of parents, i.e. $ ||Pa(n)|| $.

$$ Spikecount(j, Pa(n), n) = K_n + E(||Pa(n)||) $$

We can learn the probability distribution of this error term $ E(||Pa(n)||) $ for all parent-set sizes from our training data. So we have:

$$ AddSpikes(j, Pa(n), n) = Spikecount(j, Pa(n), n) - K_n $$

$$ P(Spikecount(j, cap(n), n) | Pa(n)=cap(n)) = P(E(||cap(n)||) = AddSpikes(j, Pa(n), n)) $$
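
Estimating $ K_n $ and the error-term samples from a training network might look like the following sketch. It reuses the `spikes_since_last_spike` sketch from above and takes $ K_n $ as the 1% quantile, as argued earlier:

def error_term_samples(net, quantile=0.01):
    ''' Sketch: pool AddSpikes observations by true parent-set size ||Pa(n)||. '''
    conn = net['connectionMatrix'].value
    counts = spikes_since_last_spike(net)
    samples = {}   # ||Pa(n)|| -> pooled AddSpikes observations
    for n, obs in counts.items():
        K_n = obs.quantile(quantile)         # robust stand-in for the minimum
        k = int(np.sum(conn[:,n]>0.0))       # true parent-set size of neuron n
        samples.setdefault(k, []).extend(obs - K_n)
    return samples

# A normalized histogram of samples[k] is an empirical estimate of P(E(k)=e).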

Prior

$$ P(Pa(n)=cap(n)) $$

This can be modeled as the probability of having a parent set of size $ ||cap(n)|| $, multiplied by the probability of having randomly drawn this specific subset of $ ||cap(n)|| $ neurons from all candidate parent sets of that size. Let $ numcap(n,k) $ denote the number of candidate parent sets of size $ k $ for neuron $ n $. Then:

$$ P(Pa(n)=cap(n)) = \frac{P(||Pa(n)||=||cap(n)||)}{numcap(n, ||cap(n)||)} $$

Note that we can learn $ P(||Pa(n)||=||cap(n)||) $ from the data.
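
For example, the empirical indegree distribution of the training graph gives an estimate of $ P(||Pa(n)||=k) $, and $ numcap(n,k) $ is a binomial coefficient once the candidate pool is fixed. A sketch, reusing the graph G from above and assuming a SciPy of that era where scipy.misc.comb is available; working in log space avoids underflow:

from scipy.misc import comb   # binomial coefficient

indeg = pandas.Series([G.in_degree(i) for i in G.nodes()])
p_size = indeg.value_counts() / float(len(indeg))   # empirical P(||Pa(n)|| = k)

def log_prior(cap_size, n_candidates):
    ''' log P(Pa(n)=cap(n)) for one specific candidate set of size cap_size
        drawn from a pool of n_candidates potential parents (sketch). '''
    return np.log(p_size.get(cap_size, 1e-12)) - np.log(comb(n_candidates, cap_size))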

Normalization Term

$$ Z $$

Z is simply a normalization factor; its exact value depends on context. We have to ensure the probability mass function sums to 1 over all possible parent sets.

Limiting the number of candidate parent sets

The above is a nice procedure to get probabilities once we have limited the number of possible parents of a neuron to a tractable number. The big question now is: how do we limit these?

We have already seen above that neurons which are direct parents of the reference neuron tend to fire more often in exactly the same time slice. While the true difference between one hop and two hops might only be a drop from roughly 73% to 63% correlation, repeated measurements give a strong hint as to which neurons might be directly connected, and which are not.
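
A hedged sketch of such a pruning step: score every potential parent by how often it fires in exactly the same time slice as the reference neuron, and keep only the best-scoring candidates.

def candidate_parents(spikes, n, top_k=20):
    ''' Sketch: shortlist candidate parents of neuron n by their rate of
        firing in the same time slice as n. '''
    times = np.nonzero(spikes[n])[0]
    if (len(times)==0):
        return np.array([], dtype=np.int32)
    co_rate = np.mean(spikes[:, times], axis=1)   # same-slice co-spike rate per neuron
    co_rate[n] = 0.0                              # a neuron is not its own parent
    return np.argsort(co_rate)[::-1][:top_k]

# Candidate parent sets cap(n) are then drawn from this shortlist only,
# which keeps numcap(n, k) tractable.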